ABSTRACT

Change detection within an audio stream is an important task 
in several domains, such as classification and segmentation 
of a sound or of a music piece, as well as indexing of broad- 
cast news or surveillance applications. In this paper we pro- 
pose two novel methods for spectral change detection without 
any assumption about the input sound: they are both based 
on the evaluation of information measures applied to a time- 
frequency representation of the signal, and in particular to the 
spectrogram. The class of measures we consider, the Rényi
entropies, is obtained by extending the Shannon entropy
definition: the dependence of these measures on a parameter
biases the spectrogram coefficients, which allows more refined
results than those obtained with standard divergences. These
methods have a low computational cost and are well suited as a
support for higher-level analysis, segmentation and
classification algorithms.

Index Terms: Change detection, spectral entropy, Kullback
divergence, Rényi entropies, segmentation

1. INTRODUCTION 

The detection of spectral changes within an audio signal can 
be performed according to many different criteria, depend- 
ing on the applications; the key point is what kind of spectral 
change has to be considered significant. A typical problem 
in audio classification is to identify signal segments with dif- 
ferent contents, for example when analyzing a radio stream 
to separate speech, music or a mix of them; another type of
problem is speaker change detection, which typically occurs
when indexing audio recordings of conferences, interviews or
lectures. In either case we have to perform a segmentation 
and a classification, but the interesting spectral changes are 
completely different. The point of view we consider is at 
the signal level, since our research is about adaptive resolu- 
tion methods for analysis, transformation and re-synthesis of 
a sound. 

The use of information measures to evaluate the features 
of a time-frequency representation of a signal is frequent in 
the literature: Shannon entropy is applied to evaluate the
concentration of the representation seen as a probability
distribution, and the derived divergence measures [1] are
employed to identify variations within the representation.

The representation we consider is the spectrogram of 
the signal: through a normalization which gives a unitary 
sum, we consider the discrete spectrogram in a finite time 
interval as a probability distribution, and we can apply typ- 
ical information measures to evaluate its concentration in 
the time-frequency plane. Fixing the signal $f$, we write
$PS_m = \{PS_f[m,k],\ k = 1, \dots, N\}$ to indicate the $m$-th
analysis frame in the discrete spectrogram $PS_f$ of $f$, where
the FFT size $N$ is the finite number of sample frequencies
considered. Given two normalized analysis frames $PS_1$ and
$PS_2$, the Kullback K divergence [1] is usually employed to
measure their difference: a spectral change is detected
whenever $K(PS_1, PS_2)$ is larger than a chosen threshold. A
refinement of this method (see for example [2]) improves
robustness to false alarms by defining a mean spectrum
$PS_{mean}$ and computing its divergence from the new analysis
frame.
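
As a minimal sketch of the normalization described above, the
following Python fragment scales the coefficients of a spectrogram
segment so that they sum to one and can be read as a discrete
probability distribution. The function name `normalize_frames` and
the toy values are illustrative assumptions, not part of the method
description.

```python
def normalize_frames(frames):
    """Scale non-negative spectrogram coefficients so that they sum to one.

    `frames` is a list of analysis frames, each a list of N coefficients.
    """
    total = sum(sum(frame) for frame in frames)
    if total <= 0:
        raise ValueError("the segment has no energy to normalize")
    return [[c / total for c in frame] for frame in frames]


# Toy segment with 2 frames and 4 frequency bins (hypothetical values).
segment = [[4.0, 1.0, 0.0, 1.0], [2.0, 2.0, 1.0, 1.0]]

# The whole segment seen as one probability distribution.
ps_segment = normalize_frames(segment)

# A single analysis frame seen as a probability distribution.
ps_first = normalize_frames([segment[0]])[0]
```

The same function covers both uses made in the text: a single frame
(a list with one element) and a finite time interval of several
frames.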

The first method we propose is a straightforward extension of
the one just described: we consider the divergence measure
derived from the Rényi entropy [3] instead of the K directed
divergence, allowing a tuning of the detection criteria thanks
to the dependence of the measure on a parameter. The second
method is not based on divergence but on the Rényi entropy
itself, exploiting one of its fundamental properties: the entropy
of a union of probability distributions can be evaluated con- 
sidering the entropy values of the individual distributions. 
Since we do consider analysis frames as probability distri- 
butions, this property can be used to establish the expected 
entropy value of a certain signal segment when the following 
frame is added: if the actual value differs significantly from 
the expected one, the last frame is considered to contain a 
spectral change. 

This kind of algorithm needs neither acoustic models to refer
to nor training data: a certain metric is evaluated in a given
space [4]. The information measures we take into account can be
applied to several different representations of the signal: in
[5] the K divergence is used in a GMM framework instead of on
the spectrogram. In several approaches, for example in [6],
difference measures are calculated as a first step which gives
a suitable analysis for segmentation and classification
purposes: for all these algorithms, the class of measures we
introduce could improve detection performance, as it offers an
additional tunable parameter while still including the K
divergence for a given value of the parameter.

In the next section we give the essential properties and 
definitions of the measures considered, then we describe the 
biasing obtained with the parameter introduced. Finally we 
present our algorithms and give some examples: we use a 
speech fragment to compare the detection with the one given 
by the K divergence measure; we take as a reference the 
segmentation given on the same signal by an HMM-based 
phoneme segmentation method [7], and the voiced-unvoiced
classification obtained with a PSOLA-based algorithm [8].
Our results are interesting as the methods provide a refined,
adjustable detection despite their simplicity and low
computational cost.

2. RÉNYI ENTROPIES AND INFORMATION MEASURES

Given a finite probability density $p$ and a rational number
$\alpha > 0$, $\alpha \neq 1$, the Rényi entropy of $p$ is
defined as follows,

$$H_\alpha[p] = \frac{1}{1-\alpha} \log_2 \sum_{k=1}^{N} p^\alpha[k], \tag{1}$$

where $p$ is in square brackets as we are considering the
measure on discrete densities; as $\alpha$ tends to one this
measure converges to the Shannon entropy, which is therefore
included in this larger class. General properties of Rényi
entropies can be found in [3], [9] and [10]; in particular,
$H_\alpha[p]$ is a non-increasing function of $\alpha$, so
$\alpha_1 < \alpha_2 \Rightarrow H_{\alpha_1}[p] \geq H_{\alpha_2}[p]$.
Moreover, for every order $\alpha$ the Rényi entropy $H_\alpha$
is maximum when $p$ is uniformly distributed, while it is
minimum and equal to zero when $p$ has a single non-zero value.
As we are working with finite discrete densities we can also
consider the case $\alpha = 0$, which is simply the logarithm of
the number of elements in $p$; as a consequence
$H_0[p] \geq H_\alpha[p]$ for every admissible order $\alpha$.
Given a second finite probability density $q$ of the same
length, if $p$ and $q$ have exactly the same zeros the Rényi
information [3] is defined as follows,

$$I_\alpha(q, p) = \frac{1}{\alpha-1} \log_2 \sum_{k=1}^{N} \frac{q^\alpha[k]}{p^{\alpha-1}[k]}, \tag{2}$$

and it tends to the Kullback I divergence [1] as $\alpha$ tends
to one. We can thus consider this class of measures to obtain
different divergences, as with the Kullback I one, and apply
them to the spectrogram frames: as long as we can give an
interpretation to the $\alpha$ parameter, this class of measures
offers considerably more detailed information about the
time-frequency representation of the signal.
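
The two measures of this section can be sketched in a few lines of
Python. This is an illustration under stated assumptions rather than
a reference implementation: the function names, the handling of
$\alpha = 1$ through the Shannon limit, and the argument order of
`renyi_information` (first argument raised to the power $\alpha$)
are choices of this sketch.

```python
import math


def renyi_entropy(p, alpha):
    """Renyi entropy in bits of a discrete density p, for alpha >= 0."""
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    support = [x for x in p if x > 0]  # zeros do not contribute
    if alpha == 1:
        # Limit case: Shannon entropy.
        return -sum(x * math.log2(x) for x in support)
    return math.log2(sum(x ** alpha for x in support)) / (1 - alpha)


def renyi_information(q, p, alpha):
    """Renyi information of order alpha between densities with the same zeros."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    if any((a > 0) != (b > 0) for a, b in zip(q, p)):
        raise ValueError("p and q must have exactly the same zeros")
    s = sum(a ** alpha / b ** (alpha - 1) for a, b in zip(q, p) if a > 0)
    return math.log2(s) / (alpha - 1)


flat = [0.25, 0.25, 0.25, 0.25]
peaky = [0.7, 0.1, 0.1, 0.1]

# Entropy of the peaky density for increasing orders (non-increasing values).
orders = [0, 0.5, 1, 2, 5]
entropies = [renyi_entropy(peaky, a) for a in orders]

# Kullback I divergence, approached by the Renyi information near alpha = 1.
kl = sum(a * math.log2(a / b) for a, b in zip(peaky, flat))
near_one = renyi_information(peaky, flat, 1.0001)
```

With these toy densities the entropy of `flat` is 2 bits for every
order, while the entropy of `peaky` decreases as the order grows.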

2.1. Biasing spectral coefficients through the $\alpha$ parameter

To show the biasing introduced on the spectral coefficients by
the $\alpha$ parameter we consider a simplified model of a
spectrogram composed of a variable number of large and small
coefficients. We build a vector $U$ of length $N = 100$ by
generating numbers between 0 and 1 with a normal random
distribution; then we consider the vectors $U_M$,
$1 \leq M \leq N$, such that

$$U_M[k] = \begin{cases} U[k] & \text{if } k \leq M \\ \frac{U[k]}{20} & \text{if } k > M \end{cases}$$

and then normalize to obtain a unitary sum. We then apply
Rényi entropy measures with $\alpha$ varying between 0 and 30:
as we see from figure 1 there is a relation between $M$ and the
slope of the entropy curves for the different values of $\alpha$.




Fig. 1. Rényi entropy evaluations of the $U_M$ vectors with
varying $\alpha$.

For $\alpha = 0$, $H_0[U_M]$ is the logarithm of the number of
non-zero coefficients and it is therefore constant; when
$\alpha$ increases, we see that densities with a small number
of large coefficients gradually decrease their entropy. This
means that by increasing $\alpha$ we emphasize the difference
between the entropy value of a peaky distribution and that of
a nearly flat one. In the next section we give an example that
exploits this important property, but care should be taken when
applying this criterion: small coefficients in a spectrogram
include signal components of weak amplitude as well as noise;
when an extremely small $\alpha$ is chosen, the robustness of
the change detection to the noise level decreases significantly.
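
The experiment of this subsection can be reproduced approximately
with the sketch below. Several details are assumptions of the
sketch and not statements about the original setup: the random
values are drawn from a normal law with mean 0.5 and standard
deviation 0.15 and clipped to stay positive and at most 1, the
small coefficients are obtained by dividing by `small_divisor`
(20 here), and only a few orders and values of $M$ are evaluated.

```python
import math
import random


def renyi_entropy(p, alpha):
    """Renyi entropy in bits of a discrete density p, for alpha >= 0."""
    support = [x for x in p if x > 0]
    if alpha == 1:
        return -sum(x * math.log2(x) for x in support)
    return math.log2(sum(x ** alpha for x in support)) / (1 - alpha)


def make_u_m(u, m, small_divisor=20.0):
    """Keep the first m coefficients of u, shrink the others, normalize to sum 1."""
    v = [x if k < m else x / small_divisor for k, x in enumerate(u)]
    total = sum(v)
    return [x / total for x in v]


random.seed(0)
N = 100
# Assumption: normal draws clipped to the interval [0.001, 1].
u = [min(1.0, max(1e-3, random.gauss(0.5, 0.15))) for _ in range(N)]

alphas = [0, 0.5, 2, 10, 30]
curves = {
    m: [renyi_entropy(make_u_m(u, m), a) for a in alphas]
    for m in (5, 50, 100)
}

for m, values in curves.items():
    print(m, [round(h, 3) for h in values])
```

For $\alpha = 0$ every curve starts at $\log_2 100$, and the curve
for the vector with only a few large coefficients falls fastest as
$\alpha$ grows, which is the behaviour described in the text.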

2.2. The entropy prediction method 

The second method we introduce is not based on a divergence
criterion, but on entropy itself. We first give the definition
of Rényi entropy for the case of distributions obtained by
discretizing their continuous version [11]: let $PS_f$ be a
normalization with unitary sum of a discrete spectrogram, then
the Rényi entropy of $PS_f$ is

$$H_\alpha[PS_f] = \frac{1}{1-\alpha} \log_2 \sum_{n,k} \left(PS_f[n,k]\right)^\alpha + \log_2(ab), \tag{3}$$

where $k$ varies between 1 and the FFT size $N$ while $n$ varies
in the time interval where the evaluation has to be performed,
according to the time grid. The term $\log_2(ab)$ takes into
account the time and frequency steps $a$ and $b$ of the lattice
$\Lambda$ used to sample the continuous spectrogram: this
guarantees the stability of the discrete entropy when changing
the hop and the FFT sizes, as long as the sampling grid is
dense enough in the time-frequency plane. For the entropy
of a single analysis frame we write
$H_\alpha[PS_f] = H_\alpha[PS_m]$ as above, where $m$ is the time
index of the analysis frame considered; for $L$ different
analysis frames, we write
$H_\alpha[PS_f] = H_\alpha[PS_m, \dots, PS_{m+L}]$ to focus on
the individual vectors. The following properties follow
directly from the definitions.

Proposition 2.1 (Rényi entropy prediction). Consider a
spectrogram $PS_f$ and a rational number $\alpha > 0$.

(i) Let $PS_m$ be an analysis frame in $PS_f$; if $PS_k$ is
obtained by rearranging the elements of $PS_m$, then

$$H_\alpha[PS_m] = H_\alpha[PS_k] = H, \tag{4}$$

$$H_\alpha[PS_m, PS_k] = H + 1. \tag{5}$$

(ii) In general, if $PS_{m+1}, \dots, PS_{m+L}$ are obtained by
rearranging the elements of $PS_m$, then

$$H_\alpha[PS_m, \dots, PS_{m+L}] = H + \log_2(L+1). \tag{6}$$

By rearrangement we mean a reordering of the frame
coefficients, thus including the case of equality between
frames. The idea of our method is that, given the entropy
$H_\alpha[PS_m, \dots, PS_{m+L}]$ of a certain signal segment
composed of $L$ contiguous frames, we can predict
$H_\alpha[PS_m, \dots, PS_{m+L+1}]$ by supposing the new frame
to be spectrally coherent and thus iso-entropic with the
previous ones. If on the other hand the entropy value of the
new segment differs largely from the predicted value, we assume
the new frame to be incoherent with the previous ones and so a
spectral change is detected. There is here a strong assumption
concerning the equivalence between the concept of spectral
coherence and the fact that two frames are obtained from each
other by rearranging their elements; according to the specific
needs of the application, the detection criteria can be based
on variations of property (6) to take into account different
definitions of spectral coherence: for example, by considering
a set of admissible operations on the analysis coefficients in
relation to the entropy variation that they provide.
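
The prediction idea can be checked numerically with the following
sketch. It is an illustration under assumptions: the names are
hypothetical, the lattice term $\log_2(ab)$ is exposed as the
parameter `grid_area` (set to 1 by default), and the prediction for
a segment extended by one frame is taken to be the current segment
entropy plus $\log_2((n+1)/n)$ for a segment of $n$ frames, which is
one way of applying property (6).

```python
import math


def segment_entropy(frames, alpha, grid_area=1.0):
    """Renyi entropy in bits of a segment, normalized to a unitary sum."""
    total = sum(sum(frame) for frame in frames)
    probs = [c / total for frame in frames for c in frame if c > 0]
    if alpha == 1:
        h = -sum(x * math.log2(x) for x in probs)  # Shannon limit
    else:
        h = math.log2(sum(x ** alpha for x in probs)) / (1 - alpha)
    return h + math.log2(grid_area)


def prediction_ratios(frames, alpha, seg_len=6):
    """Ratio of actual to predicted entropy for each frame after the first seg_len."""
    ratios = []
    for m in range(seg_len, len(frames)):
        segment = frames[m - seg_len:m]
        # Prediction: the new frame is a rearrangement of the previous ones.
        predicted = segment_entropy(segment, alpha) + math.log2((seg_len + 1) / seg_len)
        actual = segment_entropy(segment + [frames[m]], alpha)
        ratios.append(actual / predicted)
    return ratios


frame = [0.85, 0.05, 0.05, 0.05]
rearranged = frame[::-1]
flat = [0.25, 0.25, 0.25, 0.25]

h_single = segment_entropy([frame], 2)
h_pair = segment_entropy([frame, rearranged], 2)           # h_single + 1
h_triple = segment_entropy([frame, rearranged, frame], 2)  # h_single + log2(3)

# Seven coherent frames followed by a flat, spectrally different one.
frames = [frame, rearranged] * 3 + [frame, flat]
ratios = prediction_ratios(frames, alpha=2)
```

In this toy case the ratio is 1 for the coherent frame and moves
away from 1 for the flat frame; comparing the ratio with a threshold
is left to the caller.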



3. ALGORITHMS AND EXAMPLES 

We show here an application of the detection algorithms with
the measures defined. The first algorithm we analyze uses the
same operations for the K divergence and the Rényi information
(2): we calculate the spectrogram of a signal with a
1024-sample Hamming window, 768-sample overlap and 2048-point
FFT size; we obtain a mean spectrum from the first 20 analysis
frames, and calculate the divergence of the next frame with
respect to the mean spectrum. Once we have the first divergence
value, we shift the mean spectrum by one analysis frame and
consider the following 20 frames, then calculate the divergence
between the new mean spectrum and the following frame. At this
point, if the ratio between the last divergence value and the
previous one exceeds a certain threshold, a change is detected
at the incoming frame; otherwise the procedure goes on.

The second algorithm is a variation of the first one based on
entropy prediction: once the spectrogram of the signal is
obtained, we calculate the Rényi entropy of the vector composed
of its first 6 analysis frames; then we consider the next frame
and set the predicted entropy value according to (6). We
calculate the actual entropy of the vector obtained by adding
the new frame to the previous ones, and if the ratio between
this value and the predicted one exceeds a certain threshold, a
change is detected. Then the procedure goes on as in the
previous case.
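
The first algorithm can be sketched as follows on a list of
precomputed spectrogram frames. This is a sketch under assumptions
rather than the implementation used for the results: the names are
hypothetical, the Rényi information is computed with the new frame
as first argument and the mean spectrum as second, a small `floor`
is added to every coefficient so that the two densities share the
same zeros, and the synthetic frames stand in for a real
spectrogram.

```python
import math
import random


def normalize(v):
    """Scale a non-negative vector to a unitary sum."""
    total = sum(v)
    return [x / total for x in v]


def renyi_information(q, p, alpha):
    """Renyi information of order alpha (alpha > 0, alpha != 1), strictly positive inputs."""
    s = sum(a ** alpha / b ** (alpha - 1) for a, b in zip(q, p))
    return math.log2(s) / (alpha - 1)


def divergence_ratios(frames, alpha, mean_len=20, floor=1e-12, eps=1e-12):
    """Ratios between successive divergences of each frame from a sliding mean spectrum."""
    divergences = []
    for m in range(mean_len, len(frames)):
        window = frames[m - mean_len:m]
        mean = normalize([sum(col) / mean_len + floor for col in zip(*window)])
        frame = normalize([c + floor for c in frames[m]])
        divergences.append(renyi_information(frame, mean, alpha))
    # eps guards against a division by zero when two spectra coincide.
    return [cur / max(prev, eps) for prev, cur in zip(divergences, divergences[1:])]


random.seed(1)
base_a = [8, 4, 2, 1, 1, 1, 1, 1]
base_b = [1, 1, 1, 1, 1, 2, 4, 8]
# 24 noisy frames with spectrum A, then 3 noisy frames with spectrum B.
frames = [[x * (1 + 0.1 * random.random()) for x in base_a] for _ in range(24)]
frames += [[x * (1 + 0.1 * random.random()) for x in base_b] for _ in range(3)]

mean_len = 20
ratios = divergence_ratios(frames, alpha=2, mean_len=mean_len)
# ratios[i] compares frame mean_len + 1 + i with the frame before it.
change_frame = mean_len + 1 + ratios.index(max(ratios))
```

On this synthetic input the largest ratio occurs at the first frame
with spectrum B (index 24); a detector would compare each ratio with
a threshold instead of taking the maximum.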

The Rényi prediction shows a slightly better accuracy at
the price of a higher computational cost; this is due to the
larger dimensions of the vectors handled in the entropy
calculation. The tuning of the $\alpha$ parameter gives
interesting results: as seen in figure 1, higher values
increase the difference between the entropies of a peaky
distribution and a flat one; thus we expect in general a more
refined detection when increasing $\alpha$ and leaving the
threshold unchanged. The signal we analyze is a speech fragment
of a male voice in French, Venitienne et lui suce la bouche un
quart d'heure. We assume two references: an automatic phoneme
segmentation for French based on Hidden Markov Models [7], and
a voiced-unvoiced classification obtained with a PSOLA-based
algorithm [8]: they identify the major spectral changes in this
kind of signal, so we expect our detection to confirm them.
We are not interested in whether a marker belongs to one
selection or the other, as this could be established in a later
classification step. As we see at the top of figure 2, the Rényi
prediction with $\alpha = 0.2$ identifies all the
voiced-unvoiced transitions in both directions except at time
2.5, and a large part of the phonemes. If we need a less refined
detection, setting the $\alpha$ parameter to 0.05 (bottom of
figure 2) preserves the detection of all the unvoiced-voiced
transitions, while discarding all the phonemes and the
voiced-unvoiced transitions. Both measures provide a better
detection than the K divergence, which shows a higher number of
unexpected markers.

Fig. 2. Detections obtained with different methods on a speech
fragment in French; cross markers: Rényi entropy prediction
method, on top with $\alpha = 2$, at the bottom with
$\alpha = 1.1$; square markers: K divergence; diamond markers:
HMM-based phoneme segmentation method; bold line: PSOLA
voiced-unvoiced classification, 0 is unvoiced.